Resizegrad
=================

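Computes the gradient of the Resize operation. Two implementations are provided: the nearest-neighbor interpolation gradient (ResizeNearestNeighborGrad) and the bilinear interpolation gradient (ResizeBiLinearGrad). Both propagate the upstream gradient, which has the size of the forward output, back to a gradient that has the size of the forward input.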

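Nearest-neighbor interpolation gradient (ResizeNearestNeighborGrad)
----------------------------------------------------------------------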

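For nearest-neighbor interpolation, each element of the upstream gradient at position (in_y, in_x) is propagated back to its nearest-neighbor position (out_y, out_x) in the result gradient: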

.. math::

    out_y =
    \begin{cases}
        \mathrm{round}(in_y \cdot height\_scale), & \text{if } align\_corners \\
        \lfloor in_y \cdot height\_scale \rfloor, & \text{otherwise}
    \end{cases}

.. math::

    out_x =
    \begin{cases}
        \mathrm{round}(in_x \cdot width\_scale), & \text{if } align\_corners \\
        \lfloor in_x \cdot width\_scale \rfloor, & \text{otherwise}
    \end{cases}

.. math::

    out\_addr[out\_offset] \mathrel{+}= in\_addr[in\_offset]

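Here ``in_addr`` is the upstream gradient (forward-output size) and ``out_addr`` is the gradient with respect to the forward input (forward-input size). Accumulation (``+=``) is used because several positions of ``in_addr`` may map to the same position of ``out_addr``.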
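
The mapping above can be sketched as a plain C reference loop for a single NHWC batch. This is only a minimal sketch under stated assumptions, not the library's implementation: the names ``nearest_index`` and ``ref_nearest_grad_nhwc`` are hypothetical, the clamp to the last valid row or column is an added safeguard that the formulas do not state, and rounding is done by truncation because all coordinates are assumed to be non-negative.

.. code-block:: c

    #include <stddef.h>

    /* Hypothetical helper: nearest target index for a non-negative coordinate.
     * For coord >= 0, (int) truncation equals floor, and adding 0.5 before
     * truncating equals round. The clamp to limit - 1 is an assumption. */
    static int nearest_index(float coord, int align_corners, int limit) {
        int idx = align_corners ? (int)(coord + 0.5f) : (int)coord;
        return idx < limit ? idx : limit - 1;
    }

    /* Hypothetical reference loop, one batch, NHWC layout.
     * in_addr : upstream gradient, shape [in_h, in_w, channel]
     * out_addr: result gradient, shape [out_h, out_w, channel], zeroed by caller */
    void ref_nearest_grad_nhwc(const float *in_addr, float *out_addr,
                               int channel, int align_corners,
                               int in_h, int in_w, int out_h, int out_w,
                               float height_scale, float width_scale) {
        for (int in_y = 0; in_y < in_h; ++in_y) {
            int out_y = nearest_index((float)in_y * height_scale,
                                      align_corners, out_h);
            for (int in_x = 0; in_x < in_w; ++in_x) {
                int out_x = nearest_index((float)in_x * width_scale,
                                          align_corners, out_w);
                for (int c = 0; c < channel; ++c) {
                    size_t in_offset =
                        ((size_t)in_y * in_w + in_x) * channel + c;
                    size_t out_offset =
                        ((size_t)out_y * out_w + out_x) * channel + c;
                    /* several in_offset values can share one out_offset */
                    out_addr[out_offset] += in_addr[in_offset];
                }
            }
        }
    }

For example, with a 4x4 single-channel upstream gradient of ones, a zeroed 2x2 result buffer, ``align_corners`` set to 0, and both scales equal to 2 / 4 = 0.5, every result element accumulates four contributions and ends up as 4.0.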

Bilinear interpolation gradient (ResizeBiLinearGrad)
------------------------------------------------------------

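For bilinear interpolation, each element of the upstream gradient at position (h, w) is distributed, with bilinear weights, to the four neighboring positions of the result gradient. The y coordinate and its weights are computed as follows: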

.. math::

    in_y = h \cdot height\_scale

.. math::

    top_y = \max(\lfloor in_y \rfloor, 0)

.. math::

    bottom_y = \min(\lceil in_y \rceil, out\_height - 1)

.. math::

    y_{\mathrm{lerp}} = in_y - \lfloor in_y \rfloor

.. math::

    inverse\_y\_lerp = 1.0 - y_{\mathrm{lerp}}

The x dimension uses analogous formulas:

.. math::

    in_x = w \cdot width\_scale

.. math::

    left_x = \max(\lfloor in_x \rfloor, 0)

.. math::

    right_x = \min(\lceil in_x \rceil, out\_width - 1)

.. math::

    x_{\mathrm{lerp}} = in_x - \lfloor in_x \rfloor

.. math::

    inverse\_x\_lerp = 1.0 - x_{\mathrm{lerp}}

The upstream gradient element is then distributed to the four neighboring positions with these weights:

.. math::

    out\_addr[top_y, left_x] \mathrel{+}= in\_addr[h, w] \cdot
    (inverse\_y\_lerp \cdot inverse\_x\_lerp)

.. math::

    out\_addr[top_y, right_x] \mathrel{+}= in\_addr[h, w] \cdot
    (inverse\_y\_lerp \cdot x_{\mathrm{lerp}})

.. math::

    out\_addr[bottom_y, left_x] \mathrel{+}= in\_addr[h, w] \cdot
    (y_{\mathrm{lerp}} \cdot inverse\_x\_lerp)

.. math::

    out\_addr[bottom_y, right_x] \mathrel{+}= in\_addr[h, w] \cdot
    (y_{\mathrm{lerp}} \cdot x_{\mathrm{lerp}})
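
The bilinear formulas above can be sketched as a C reference loop over one channel plane. This is a minimal sketch, not the library's implementation: the name ``ref_bilinear_grad_plane`` is hypothetical, and it assumes non-negative scales chosen so that the floor of every mapped coordinate stays inside ``out_addr``.

.. code-block:: c

    #include <stddef.h>

    /* Hypothetical reference loop, one channel plane.
     * in_addr : upstream gradient, in_h x in_w
     * out_addr: result gradient, out_h x out_w, zeroed by the caller
     * Coordinates are assumed non-negative, so (int) truncation equals floor. */
    void ref_bilinear_grad_plane(const float *in_addr, float *out_addr,
                                 int in_h, int in_w, int out_h, int out_w,
                                 float height_scale, float width_scale) {
        for (int h = 0; h < in_h; ++h) {
            float in_y = (float)h * height_scale;
            int floor_y = (int)in_y;
            int ceil_y = floor_y + (in_y > (float)floor_y);
            int top_y = floor_y > 0 ? floor_y : 0;
            int bottom_y = ceil_y < out_h - 1 ? ceil_y : out_h - 1;
            float y_lerp = in_y - (float)floor_y;
            float inverse_y_lerp = 1.0f - y_lerp;

            for (int w = 0; w < in_w; ++w) {
                float in_x = (float)w * width_scale;
                int floor_x = (int)in_x;
                int ceil_x = floor_x + (in_x > (float)floor_x);
                int left_x = floor_x > 0 ? floor_x : 0;
                int right_x = ceil_x < out_w - 1 ? ceil_x : out_w - 1;
                float x_lerp = in_x - (float)floor_x;
                float inverse_x_lerp = 1.0f - x_lerp;

                float g = in_addr[(size_t)h * in_w + w];
                /* scatter g to the four neighbors; the weights sum to 1 */
                out_addr[(size_t)top_y * out_w + left_x] +=
                    g * inverse_y_lerp * inverse_x_lerp;
                out_addr[(size_t)top_y * out_w + right_x] +=
                    g * inverse_y_lerp * x_lerp;
                out_addr[(size_t)bottom_y * out_w + left_x] +=
                    g * y_lerp * inverse_x_lerp;
                out_addr[(size_t)bottom_y * out_w + right_x] +=
                    g * y_lerp * x_lerp;
            }
        }
    }

Because the four weights sum to 1 for every (h, w), the sum over ``out_addr`` equals the sum over ``in_addr``, which makes a convenient sanity check. For a 3x3 upstream gradient of ones, a zeroed 2x2 result buffer, and both scales equal to (2 - 1) / (3 - 1) = 0.5, every result element becomes 2.25.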

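Inputs:
    - **in_addr** - Pointer to the upstream gradient data (gradient at the forward-output size).
    - **out_addr** - Pointer to the result gradient data (gradient at the forward-input size). It must be initialized to 0 before the call, because the operator accumulates into it.
    - **batch_size** - Batch size.
    - **channel** - Number of channels.
    - **format** - Data layout: 0 for NHWC, 1 for NCHW.
    - **align_corners** - Corner-alignment flag: 0 to disable, 1 to enable.
    - **in_height** - Input height, that is, the number of rows of ``in_addr`` that the formulas above iterate over.
    - **in_width** - Input width, that is, the number of columns of ``in_addr``.
    - **out_height** - Output height; in the formulas above it bounds the row index into ``out_addr``.
    - **out_width** - Output width; in the formulas above it bounds the column index into ``out_addr``.
    - **height_scale** - Height scale factor, typically (out_height - 1) / (in_height - 1) when ``align_corners`` is 1, or out_height / in_height otherwise.
    - **width_scale** - Width scale factor, typically (out_width - 1) / (in_width - 1) when ``align_corners`` is 1, or out_width / in_width otherwise.

Outputs:
    - **out_addr** - The computed input gradient (gradient at the forward-input size).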
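
The scale factors and the two data layouts described above can be sketched with two small helpers. Both names (``resize_grad_scale`` and ``resize_grad_offset``) are hypothetical and not part of the library. The sketch assumes densely packed row-major buffers, and the fallback to the plain ratio when ``in_size`` is 1 is an added assumption to avoid a division by zero.

.. code-block:: c

    #include <stddef.h>

    /* Hypothetical helper: scale factor along one axis, following the two
     * conventions listed for height_scale and width_scale. */
    float resize_grad_scale(int in_size, int out_size, int align_corners) {
        if (align_corners && in_size > 1) {
            return (float)(out_size - 1) / (float)(in_size - 1);
        }
        /* assumed fallback, also used when in_size == 1 */
        return (float)out_size / (float)in_size;
    }

    /* Hypothetical helper: flat element offset of (n, c, y, x) in a dense
     * buffer. format: 0 = NHWC, 1 = NCHW. */
    size_t resize_grad_offset(int format, int n, int c, int y, int x,
                              int channel, int height, int width) {
        if (format == 0) {
            /* NHWC: channel is the fastest-varying index */
            return (((size_t)n * height + y) * width + x) * channel + c;
        }
        /* NCHW: x is the fastest-varying index */
        return (((size_t)n * channel + c) * height + y) * width + x;
    }

With ``in_size`` = 4 and ``out_size`` = 6, ``resize_grad_scale`` returns 5 / 3 when ``align_corners`` is 1 and 1.5 when it is 0.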

Supported platforms:
    ``FT78NE``
    ``MT7004``

.. note::
    - FT78NE supports fp32.
    - MT7004 supports fp16 and fp32.


**Shared-memory version:**

.. c:function:: void fp_resizenearestneighborgrad_s(float* in_addr, float* out_addr, int batch_size, int channel, int format, int align_corners, int in_height, int in_width, int out_height, int out_width, float height_scale, float width_scale, int core_mask)
.. c:function:: void hp_resizenearestneighborgrad_s(half* in_addr, half* out_addr, int batch_size, int channel, int format, int align_corners, int in_height, int in_width, int out_height, int out_width, float height_scale, float width_scale, int core_mask)
.. c:function:: void fp_resizebilineargrad_s(float* in_addr, float* out_addr, int batch_size, int channel, int format, int align_corners, int in_height, int in_width, int out_height, int out_width, float height_scale, float width_scale, int core_mask)
.. c:function:: void hp_resizebilineargrad_s(half* in_addr, half* out_addr, int batch_size, int channel, int format, int align_corners, int in_height, int in_width, int out_height, int out_width, float height_scale, float width_scale, int core_mask)

**C usage example (nearest-neighbor interpolation gradient):**

.. code-block:: c
    :linenos:
    :emphasize-lines: 26-28

    #include <stdio.h>
    #include <string.h>
    #include <resizegrad.h>

    int main(int argc, char* argv[]) {
        /* in_addr: upstream gradient; out_addr: result gradient */
        float *in_addr = (float *)0x10010000;
        float *out_addr = (float *)0x10020000;

        int batch_size = 4;
        int channel = 3;
        int format = 1;          /* 1 = NCHW */
        int align_corners = 1;
        int in_height = 4, in_width = 4;
        int out_height = 6, out_width = 6;
        int core_mask = 0xff;

        /* align_corners = 1: both sizes are reduced by 1 */
        float height_scale = (float)(out_height - 1) / (in_height - 1);
        float width_scale = (float)(out_width - 1) / (in_width - 1);

        /* out_addr is indexed up to out_height - 1 and out_width - 1 and is accumulated into */
        int output_size = out_height * out_width * channel * batch_size;
        memset(out_addr, 0, output_size * sizeof(float));

        fp_resizenearestneighborgrad_s(in_addr, out_addr, batch_size, channel,
                                        format, align_corners, in_height, in_width,
                                        out_height, out_width, height_scale, width_scale, core_mask);
        return 0;
    }

**Private-memory version:**

.. c:function:: void fp_resizenearestneighborgrad_p(float* in_addr, float* out_addr, int batch_size, int channel, int format, int align_corners, int in_height, int in_width, int out_height, int out_width, float height_scale, float width_scale)
.. c:function:: void hp_resizenearestneighborgrad_p(half* in_addr, half* out_addr, int batch_size, int channel, int format, int align_corners, int in_height, int in_width, int out_height, int out_width, float height_scale, float width_scale)
.. c:function:: void fp_resizebilineargrad_p(float* in_addr, float* out_addr, int batch_size, int channel, int format, int align_corners, int in_height, int in_width, int out_height, int out_width, float height_scale, float width_scale)
.. c:function:: void hp_resizebilineargrad_p(half* in_addr, half* out_addr, int batch_size, int channel, int format, int align_corners, int in_height, int in_width, int out_height, int out_width, float height_scale, float width_scale)

**C usage example (bilinear interpolation gradient):**

.. code-block:: c
    :linenos:
    :emphasize-lines: 25-27

    #include <stdio.h>
    #include <string.h>
    #include <resizegrad.h>

    int main(int argc, char* argv[]) {
        /* in_addr: upstream gradient; out_addr: result gradient */
        float *in_addr = (float *)0x10010000;
        float *out_addr = (float *)0x10020000;

        int batch_size = 4;
        int channel = 3;
        int format = 0;          /* 0 = NHWC */
        int align_corners = 0;
        int in_height = 4, in_width = 4;
        int out_height = 6, out_width = 6;

        /* align_corners = 0: plain size ratio */
        float height_scale = (float)out_height / in_height;
        float width_scale = (float)out_width / in_width;

        /* out_addr is indexed up to out_height - 1 and out_width - 1 and is accumulated into */
        int output_size = out_height * out_width * channel * batch_size;
        memset(out_addr, 0, output_size * sizeof(float));

        fp_resizebilineargrad_p(in_addr, out_addr, batch_size, channel,
                                 format, align_corners, in_height, in_width,
                                 out_height, out_width, height_scale, width_scale);
        return 0;
    }